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_ , Abstract 

CN ■ 

CNj ' If modern computers are sometimes superior to humans in some spe- 

cialized tasks such as playing chess or browsing a large database, they 
can't beat the efficiency of biological vision for such simple tasks as rec- 
ognizing and following an object in a complex cluttered background. We 

^ ^ , present in this paper our attempt at outlining the dynamical, parallel and 

event-based representation for vision in the architecture of the central ner- 
vous system. We will illustrate this on static natural images by showing 

y^ . that in a signal matching framework, a L/LN (linear/non-linear) cascade 

^^' may efficiently transform a sensory signal into a neural spiking signal and 

we will apply this framework to a model retina. However, this code gets 
redundant when using an over-complete basis as is necessary for modeling 

CN ' the primary visual cortex: we therefore optimize the efficiency cost by 

increasing the sparseness of the code. This is implemented by propagat- 
ing and canceling redundant information using lateral interactions. We 
compare the efficiency of this representation in terms of compression as 
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_^ , the reconstruction quality as a function of the coding length. This will 

• ' correspond to a modification of the Matching Pursuit algorithm where 

'bj \ the ArgMax function is optimized for competition, or Competition Opti- 

mized Matching Pursuit (COMP). We will in particular focus on bridging 
neuroscience and image processing and on the advantages of such an in- 
terdisciplinary approach. 
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1 Introduction: efficient neural representations 

The architecture of modern day computers illustrate how we understand in- 
telligence. But, if they are good at playing chess or at browsing databases, 
it is clear that computers are far from rivaling with what appears to be more 
simple aspects of intelligence such as the ones demonstrated in vision. Think 
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for instance as something as simple as recognizing an object in natural condi- 
tions, such as while walking in the street. This necessarily involves a network 
of processes from segmenting its outline, perceiving its global motion, matching 
its different patterns invariantly to the shading, contrast, angle of view or to 
occlusions. Actually, while this seems obvious to us, computers cannot perform 
this task and it is a common practical "Turing Test" to authenticate humans 
versus spamming robots by challenging the login upon recognizing for instance 
warped letters on a noisy background (the so-called CapTchas). 
As the seat of this processing, the Central Ner vous System (CNS) i s therefore 
by its efhciency clearly different from a classical I von NeumannI ( 19661 ) computer 



defined as a sequential Turing-like machine with a few, very rapid Central Pro- 
cessing Units and a finite, adressable memory. Computational Neuroscience 
is a branch of neuroscience studying specifically the structure and function of 
co mputations in t he CN S such as the more complex architectures imagined 
by Ivon NeumannI pOOOT ). Numerous successful theories exist to explain the 



complex dynamics of modern Artificial Neural Networks and how we rnay use 



neuro-physiological constraints to build up efficient svstem s (iGrossberd . 120031 ) 



that are ecologically adapted to the statistics of the input ( Atickl . ll992 ). How- 
ever, a main challenge involving both neuroscience and computer science is to 
understand how and for what class of problems the CNS outperforms tradi- 
tional computers. I am interested in this paper in extracting general principles 
from the structure of the CNS to derive a better understanding of the neural 
functions but also to apply these algorithms to signal processing applications. 
A fundamental difference of the CNS is the fact that 1) information is distributed 
in parallel on the different neurons, 2) processes are dynamical and interrupt- 
ible, 3) information is carried by elementary events, called spikes which may be 
transmitted over long distances. This is well illustrated for the large class of 
pyramidal neurons of the neocortex. In a simplistic way the more a neuron is 
excited, the quicker and the more often it will emit spikes, with a typical latency 
of some milliseconds and a maximum firing frequency of the order of 200 ms. 
Concentrating on local cortical areas (that is in human to the order of some 
squared centimeters and to a billion neurons), it means that the complexity of 
some operation will be different on a computer (a few but very rapid CPUs) 
and a population of neurons (a huge number of slow dynamical event genera- 
tors). For instance, the complexity of the ArgMax operator (finding the sorted 
indices from a vector) will increase as 0{Nlog{N)) with the dimension N of the 
vector, while if we apply the vector as the activation of a neuronal population, 
the complexity will not increase with the number N of neuron^j. In addition, 
the result is given by the generated spike list and is interruptible. 
In this paper, we will explore how we may apply this class of operators to 
the processing of natural images by presenting an adaptive Linear/Non-Linear 
framework and then optimize its efficiency. We will in a first step draw a ra- 
tionale for using a linear representation by linking it to a probabilistic repre- 
sentaiton under the condition of decorrelation. Then we will derive a linear 
transform adapted to natural images by con structing a simple pyramidal archi- 
tecture similar to ( Burt and Adelsonl 1983 ) and extend it to a Laplacian and 



Log-Gabor pyramids (JFischer et al.l . 120051 ) . We will then in a third section pro- 



^Note that in a noisy environment, the output will be given with a certain temporal 
precision and that this precision may decrease with N. 



pose that this hnear information may be optimally coded by a spike list if we 
apply a point non- linear operation. A t least , we will define an improvment over 
Matching Pursuit ( Mallat and Zhang . 1X9931 ) by optimizing the efficiency of the 



ArgMax operator arid whic h finally defines Sparse Spike Coding (|PerrinetLl2004L 
20071 : IPerrinet et al.l . l20o3 ). 



2 Linear filtering and whitening 

A first step in the definition of this algorithm is to explicit the linear operations 
which are used to transform the input vector into a value representative of the 
quality of a match. Let's define an image as a set of scalar values Xi on a set 
of positions P, i being the index of the positions, so tha t it defines a v ector 



X G M*^, with M = card{V). As we saw in previous works (jPerrinetl . l2004l ). the 
quality of a match between the raw data x with a known image may be linked in 
a probabilistic framework to the correlation coefficient. In fact, the probability 
of the signal x knowing the "shape" h of the signal to find (see the table Tab. [2] 
for the chosen notation) is: 

P{h\i) = ^^P(i\h)P{h) 



' ' .exp(^(^-^)^""(^"^)^.PW 



P{x) (27r)*^/2 



(1) 



This is based on the assumption of centered data (that is E{x) = 0), a Linear 
Generative Model and a gaussian noise of covariance matrix S = E{xx^) (See 
Chapter 2.1.4 of ( Perrineti . 120071 )). In the case where the noise is white (that 



is that the covariance matrix is a diagonal matrix) and assuming an uniform 
prior for the scalar value of h, this may be simply computed with the correlation 
coefficient defined by: 

h X dcf Y,l<i<M ^i^i /„x 



l<i<M -^i 

It should be noted that pj is the Af ^''-dimensional cosinus and that its absolute 
value is therefore bounded by 1. The value of ArcCos(pj) would therefore give 
the angle of x with the pattern h and in particular, the angle would be equal 
(modulo 27r) to zero if and only if pj = 1 (full correlation), n if and only if 
Pj — ~l (full anti-correlation) and ±7r/2 ii pj = (both vectors are orthogonal, 
there is no correlation). Also, it is independent to the norm of the filters and 
we assume without loss of generality in the rest that these are normalized to 
unity. To achieve this condition, the raw data x has to be preprocessed with a 
decorrelation filter to achieve a signal x with no mean point-wise correlatior|3. 
To define this, we may use for instance the eigenvalue decomposition (EVD) of 
the covariance matrix: 

S = VDV^ 

(3) 



^Of course, this docs not achieve necessarily independence as is often stated. 





Figure 1: Spatial decor relation. (Top-Left) Sample raw natural image {M — 
512^). (Bottom-Left) Mean pairwise spatial correlation in a set of 1000 natural 
images (Red is 1, blue is zero). It shows the typical decrease in -^ of the power 
spectrum but also an anisotropy along the vertical and horizont al axis. (Middle) 
decorrelation filter computed from the methods of |AticlJ . ll992l ) (see text). This 
profile is similar to the interaction profile of bipolar and horizontal cells in the 
retina. (Top-Right) Whitening of the sample image. (Bottom- Right) The mean 
pairwise spatial correlation of 1000 whitened natural images is highly peaked at 
the origin and inferior to 0.05 elsewhere. As is observed i n the LGN , the p ower 
spectrum is relatively decorrelated by our pre-processing (JDan et al.1 . 119961 ). See 
script experiment_whitening.py to reproduce the figure. 



where V is a rotation (and thus V^^ = V-'") and D is a diagonal matrix. 
This decomposition is similar to that achieved by PCA and may be computed 
for instance byayeraging linear correlations such as is done with the linear 
Hebbian rule (jOjal . Il982l ) . In particular, the columns of matrix V contain the 
eigenvectors and D is a diagonal matrix of the corresponding eigenvalues. If we 
set W = D^2 V"^ and x — Wx, then 



E{xx^ 



E{^Ni{^NS:f) 

D 5V^SVD 5 

D 5V^VDV^VD 5 

1 AfxM 



We therefore proved that this linear transform de-correlates on average the input 
data. In practice, we used the power spectrum and its relation to the covariance 
in tra nslation inva riant data such as natural images to compute the whitening 
filter ( Atickl . Il992f) . This corresponds then to a filter with a gain proportional 



to the spatial frequency but with an anisotropy on the vertical and horizontal 
axis (see Fig. [T]). 

Thanks to this processing, and only when these hypothesis have been fuUfilled, 
we may in general use the correlation coefficient (see Eq. [5]) as a measure related 
to the probability of a match of the image with a given pattern. The next step 
is now to define the best patterns to represent images. 



Table 1: Matrix notation and dcnoising Variables 



Name 


Symbol 


Description 


Pixel positions 


V 


p e P,card{P) = M 


Raw image 


X 


i e R", E{i) = 


Covariance matrix 


S 


S e ^M-xM 


Whitening matrix 


w 


W e M*^xA-^ 


Decorrelated image 


X 


x^Wxe M^'^ 


Pattern image 


h, 


hj eM*^j e V 


Overcomplete dictionary 


V 


card{V) = N :$> M 


Decorrelated pattern image 


h, 


hj = Whj e M^ 


Transform matrix 


H 


H e K^><*^ 


Correlation coefficient 


Pj 


pj^ \\h;\M e i I'l] 



3 Multiscale representations: the (Golden) Lapla- 
cian Pyramid 

Multi-scale representations are a popular method to allow for a scale invariant 
representation. This correspond to repeating basic shapes at different scales and 
it thus allows that one may easily compute the representation of a scaled image 
by a simple transformation in the representation space instead of recomputing 
the whole transform. As a consequence, this representation makes it for instance 



easier to compute the match of a feature at different scales. It is classicahy 
implemented in wavelet transforms but we present he re a simple implementatio n 
using a recurrent scheme, the Laplacian Pyramid ( Burt and Adelsonl Il983r ). 



This transform has indeed the advantage of being computed by simple down- 
scaling and up-scaling operations and is easily inverted for the reconstruction 
of the image. It transforms an image in a list of down-scaled images, or image 
pyramid. Let's define the list {M'^} with < fc < s of the sizes of the down- 
scaled images (fc = corresponds to the "base" and M° = M while s is the 
level of the smallest image, that is the summit of the pyramid). Typically, such 
as in wavelets, the size decreases geometrically with an exponent 7. The most 
used exponent in image processing is 2, the pyramid is then called dyadic. The 
corresponding down-scale and up-scale transform from level fc to fc -I- 1 may 
be defined as Vk and Uk respectively. We may therefore define the gaussian 
pyramid as the recursive transform from the "base" of the pyramid to the top 
as the list of transforms: 

g ^{V'^jwithV'' =Voo---oVk (4) 

This means that a down-scaled version of the image V'^x may be obtained 
by applying all down-scaling transforms sequentially from the base to level fc. 
If the elementary operators are linear, the G transform is linear. The cor- 
respon ding filters correspond a pproximately to gaussians with increasing ra- 



diuses (jBurt and Adelsonl . 119831 ) and the images in the pyramid thus correspond 
to progressively more blurred versions of the "base" image. This transform is 
usually very fast and is very likely to be implemented by the extended dendritic 
arbor of neuronq^. 

The Laplacian Pyramid is defined from the Gaussian Pyramid as the pyramid 
of images constituted by the residual between the image at one scale and the 
up-scaled image from the upper level. It is therefore mathematically defined as: 

C = {V'' - (Uk o P'=+i)} with < fc < s (5) 

by defining for clarity that I?° = 1 and V^^^ = 0. This transform is still linear 
that is that Va;, Vy, VA, C{x + y) = Cx + Ly and C(\x) = XCx. Since every level 
corresponds to the residual, it is easy to invert. In fact, if we write as CkX the 
image at level fc and U'' =Un o ■ ■ ■ oUk, then Vx, 



J2 ^"^kx = J2 ^'p'-(Wfe°2?"+'))a 



0<k<s Q<k<s 



= J2 U^V'^x- Y^ U^UkoV^+^X 
0<k<s 0<k<s 

= Y^ U^V'^x- Y U^^'D^x^x (6) 

0<fc<s l<fc<s+l 

Therefore the inverse of the Laplacian Pyramid transform is defined as: 

£-1 = Y. ^'^fe (7) 

a<k<s 



^Note however that in vertebrates, the retinal representation the preferred spatial frequency 
grows with eccentricity. 



Figure 2: The Golden Laplacian Pyramid. To represent the edges of the im- 
age at different levels, we may use a simple recursive approach constructing 
progressively a set of images of decreasing sizes, from a base to the summit 
of a "pyramid" . Using simple down-scaling and up-scaling operators we may 
approximate well a Laplacian operator. This is represented here by stacking 
images on a "Golden Rectangle", that is where the aspect ratio is the golden 
section (f> = ^^2 • ^^ present here the base image on the left and the succes- 
sive levels of the pyramid in a clockwise fashion (for clarity, we stopped at level 
8). Note that here we also use 0^ (that is -I- 1) as the down-scaling factor so 
that the resolution of the pyramid images correspond across scales. Note at last 
that coefficient are very kurtotic: most are near zero, the distribution of coef- 
ficients has "long tails". See script experiment_SpikeCoding.py to reproduce 
the figure. 



The filters corresponding to the different levels of the pyramid (and which are 
the inverse image of a Dirac pyramid by L~^) are similar to difference of gaus- 
sians (because they are the difference of two successive levels of the Gaussian 
Pyramid). The exponent 7 will therefore play the important role of the ratio 
of the the radiuses of the Gaussians. We choose here the exponent to be equal 
to the golden number 7 = 0= -'- ^ ^ w 1.618033 for two reasons. First, it 
corresponds to a value which approximates well a Laplacian-of-Gaussians with 
a Difference of Gaussians as is implemented here. Second, it allows to construct 
a natural representation of the whole pyramid in a full Golden Rectangle (see 
Fig. [5]) where the resolution of each image will be constant. 
Note the following properties of the pyramid: 



the over-completeness is equal to ^ 



0<A;<s 'y 



^Z'^ 



1-7- 



so that it is equal 

to i_i-'i = A,-Z,-i = 4' which is indeed the area of the Golden Rectangle 
compared to the area of the image. It is slightly higher than for a dyadic 
j^ ^ I « 1.333 < 0). 



pyramid (indeed 



• since this linear transform is over-complete, there may exist non zero pyra- 
mids which inverse image is null (that is 3L 7^ such that C^^L = 0) but 
this pyramids are not accessible from any non-null image. 

• one may also implement a simple "Golden Pyramid" using the Fourier 
transform, and one may observe that in both cases, the filters corresponds 
to localized filters in the frequency space. The whitening (see Sec. [5]) 
has an approximately scalar effect that corresponds to an equalization of 
the variances of the coefficients to natural images at the different spatial 
frequencies. 

• Finally, once the obtained filters are normalized, the coefficients will cor- 
respond to the correlation coefficients of the image with edge detectors at 
different scales as defined in Eq. [21 The coefficients will therefore as in 
wav elet analysis co r respo nd to the local Lipschitz coefficients of the im- 



age (jPerrinet et al.l . l2004l ) . When ordered by decreasing absolute values 



they will correspond to features of decreasing singularities, from a pure 
singularity, to a smooth transition (as a ramp of luminosity) . 



Table 2: Notations used for the Laplacian Pyramid 



Name 


Symbol 


Description 


sizes of the down-scaled images 


{An 


0< fc < s 


Down-scale operator 


Vk 


from level fc to fc + 1 


Up-scale operator 


Uk 


from level fc to fc + 1 


Full Down-scale operator 


pfc 


V° = 1 and V'+^ = 


Full Up-scale operator 


w 




Gaussian Pyramid 


Q 




Laplacian Pyramid 


C 


C = {Ck} with < fc < s 



4 Spike Coding 

Now that we defined a linear transform which is suitable for natural images by 
associating the whitening filters and the Laplacian Pyramid, we wish to transmit 
this information efficiently using neurons. As we saw in the previous section, 
the higher coefficients correspond to more singular features and therefore to 
more informative content. By using Integrate-and-Fire neurons, it is therefore 
natural that we may associate to every coefficient of the pyramid applied to the 
image a single neuron. For the linear Leaky-IF, if we associate a driving current 
to each value p, (with < 7 < A^, as noted in Tab. [2]) it will will elicit spikes 



with latencies (jPerrinet et al.l . l2004l ): 



Aj=Tlog ^ (8) 

where r is the characteristic time constant, 9 is the neuron's threshold and gj is 
a monotonously increasing function of pj corresponding to the transformation of 
the linear value into the driving current. By this architecture, since the relation 
in Eq. [5]is monotonously increasing, one implements a simple ArgMax operator 
where the output is the index of the neurons corresponding to the ordered list 
of output spikes. 

However, one may observe that for some linear transforms, the distribution of 
correlation coefficients may be not similar for all j. This is contradictory with 
the fact that spikes are similar across the CNS since it would mean that the 
probability of the coefficient underlying the emission of a spike is not uniform. 
To optimize the efficiency of the ArgMax operator, one has therefore to ensure 
that one optimizes the entropy of the index of output spikes and therefore of 
the driving current. This may be ensured by modifying the functions gj so that: 

1. for all J, the distributions of gj{pj) are similar, 

2. allow that this overall distribution has a shape adapted to the spiking 
mechanism (for instance by using Eq. [5]) . 

The second point — finding a global non-linearity g — will be out of scope of 
this paper, and we will for the sake of generality only ensure that we find func- 
tions fj (with gj —go fj) such that the variables Zj — fj{pj) are uniformly 
distributed. 
This condition is easily performed by operating a point non-l inearity on t he dif- 



ferent variables pj based on the statistics of natural images (|Atickl . ll992f ). This 
method is similar to histogram equalization in image processing and provides an 
output with maximum entropy for a bounded output: it therefore optimizes the 



codin g efficiency of the representation in terms of compression (Ivan Haterenl . 
1993 ) or dually the minimization of intrinsic noise ( Srinivasan et al.l . ll982f) . It 



may be easily derived from the probability P of variable Pj (bounded in absolute 
value by 1) by choosing the non-linearity as the cumulative function 

MPj) = /_'' dPip) (9) 

where the symbol dP{x) = Px{x)dx will here denote in general the probability 
distribution function (pdf) for the random variable X. This process has been 



observed in a variety of spec ies and is for instance perfectly illustrated in the 
salamander (Laughlinl. Il981r ). It may evolve dynamically to slowly adapt to 



varying changes in luminances, such as when the ligh t diminishes a t dawn but 
also to some more elaborated schemes within a map ( Hosova et all 120051 ) . As 



in "ideal democracies" where all neurons are "equal", this process has to be 
dynamically updated over some characteristic period so as to achieve optimum 
balance. As a consequence, since for all j, the pdf of Zj — fj{pj) is uniform and 
that sources are independent, it may be considered as a random vector drawn 
from an uniform distribution in [0, 1]. Knowing the different spike generation 
mechanisms which are similar in that class of neurons, every vector {pj} will thus 
generate a list of spikes {j(l), j(2), . . .} (with corresponding latencies) where no 
information is carried a priori in the latency pattern but all is in the relative 
timing across neurons. 

We coded the signal in a spike volley, but how can this spike list be "decoded" , 
especially if it is conducted over some distance and therefore with an additional 
latency? In the case of transient signals, since we coded the vector {pj } using 
the homeostatic constraint from Eq. [31 we may retrieve the analog values from 
the order of firing neurons in the spike list. In fact, knowing the "address" of the 
fiber j(l) corresponding to the first spike to arrive at the receiver end, we may 
infer that it has been produced by a value in the highest quantile of P(/5j(i)) on 
the emitting side. We may therefore decode the corresponding value with the 
best estimate Pjm — f~(i){ji) where N is the total number of neurons. This 

is also true for the following spikes and if we write as Zj(fe) — -^ the relative 
rank of the spike (that is neuron j{k) fired at rank fc), we can reconstruct the 
corresponding value as 

Pj(fc) =/i(fe)(l-^j(fc)) (10) 

This corresponds to a generalized rank coding scheme ( Perrinetill999t|Perrinet et al 



20011 ). First, it loses the information on the absolute latency of the spike train 
which is giving the maximal value of the input vector. This has the partic- 
ular advantage of making this code invariant to contrast (up to a fixed de- 
lay due to the precision loss induced by noise). Second, when normalized by 
the maximal value, it is a first order approximation of the vector which is es- 
pecially relevant for over-complete representations where the information con- 
tained in the rank vector (which is thanks to Stirling's approximation of order 
log2(A^!) = 0{N. log(A^)), that is more than 2000 bits for 256 neurons) is CTeater 
than the information contained in the particular quantization of the imagcl- On 
a practical note, we may use the fact that the inverse of fj may be computed 
from the mean over trials of the function of the absolute functions as a function 
of the rank. 

This code therefore focuses on the particular sequence of neurons that were 
chosen and loses the particular information that may be coded in the pattern 
of individual inter-spike intervals in the assembly. A model accounting for the 
exact spiking mechanism would correct this information loss, but this would be 
at the cost of introducing new parameters (hence new information), while it 
seems t hat this information would have a low impact relative to the total infor- 
mation ( Panzeri et al.l . llQQOl ) . More generally, one could use different mappings 



for the transformation of the z value into the a spike volley which can be more 



*We are generally unable to detect quantization errors on an image consisting of more 256 
gray levels, that is for 8 bits. 
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Figure 3: Spike Coding of natural images. We did build here a simple framework 
of pyramidal neurons illustrating the efficiency of neural architectures compared 
to cla ssical computer architectures . We show here how a bundle of L-NL neu- 
rons (jCarandini et al.l 119971 120051 ) tuned by a simple homeostatic mechanism 
allow to transfer a transient information, such as an image, using spikes. (L) 
The signal to be coded, for instance the match pj of an image patch (the tiger 
on the left bottom) with a set of filters (edge-like images), may be considered as 
a stochastic vector defined by the probability distribution function (pdf ) of the 
values Pj to be represented. (NL) By using the cumulative function as a point 
non-linearity fj, one ensures that the probability of Zj = fjifj) is uniform, that 
is that the entropy is maximal. This non-linearity in the L-NL neuron imple- 
ments a homeostasis that is controlled only by the time constant with which the 
cumulative probability function fj is computed (typically 10^ image patches in 
our case). (S) Any instance of the signal may then be coded by a volley of spikes: 
a higher value corresponds to a shorter latency and a higher frequency. (D) In- 
versely, for any spike events vector, one may estimate the value from the firing 
frequency, the latency. We may simply use the ordering of the spikes since the 
rank provides an estimate of the quantile in the probability distribution function 
thanks to the equalization. Using the inverse of fj one retrieves the value in 
feature space so that this volley of spikes is decoded (or directly transformed) 
thanks to the relative timing of the spikes using the modulation (see Eq. [TU)) . 
This builds a robust information channel where information is solely carried by 
spikes as binary events. Given this model, the goal of this work is to find the 
most efficient architecture to code natural images and in particular to define 
a coding cost and to derive efficient compression algorithms. Note that this 
scheme is similar to the N-NL scheme but that instead of generating a Pois- 
son point process, we use the the exact timing. This is allowed by the point 
non-linearity which permits to code the value by the timing and not the firing 
frequency. 
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adapted to continuous flows, but this scheme corresponds to an extreme case 
(a tr ansient signal) which is useful to stress on the dynamical part of the cod- 
ing (jvan Rullen and Thorpd . 120011 ) and is mathematically more tractable. In 
particular, one may sho w that the coding erro r is proportional to the variability 
of the sorted coefficients ( Perrinet et al.Ll2004 ). the rest of the information being 
the information coded in the time intervals between two successive spikes. Thus, 
the eflBciency of information transmission will directly depend on the validity 
of the hypothesis of independence of the choice of components and therefore on 
the statistical model build by the LGM. 

It should be also noted that no explicit reconstruction is necessary (in the math- 
ematical sense of the term) on the receiver side as we do here, since the goal 
of the receiver could only be to manipulate information on for instance some 
subset on the spike list (that is on some receptive field covering a subpart of 
the population). In simple terms, there is no reason to have a reconstruction of 
the image in the CNS. In particular one may imagine that we may add some 
arbitrary global point linearity to the z values in order to threshold low values 
or to quantize values (for instance set all values to 1 only for the first 10% of 
the spikes). However, this full reconstruction scheme is a general framework for 
information transmission, and we may then imagine that if for instance we pool 
information over a limited receptive field, the information needed (the ranks in 
the sub-spikelist) will still be available to the receiver directly without having 
to compute the full set (in fact, since the pdf of z is uniform, the pdf of a subset 
of components of z is also uniform) . 



5 Sparse Spike Coding 



However, as we described before (|PerrinetJ . 12004 l2007t IPerrinet et al.l . 12003), 
if we use over-complete dictionaries of filters, the resulting spiking code gets 
redundant. In fact, unless the dictionary is orthogonal, when choosing one 
component over an other, any choice may modify the choice of the other com- 
ponents. If we chose the successive neurons with maximum correlation values, 
the resulting representation will be proportionally more redundant when the dic- 
tionary gets more over-complete. Also, we saw tha t optimizing the choice leads 
then to a combinatorial explosion ( Perrinetl 120081 ). To solve this NP-complete 
problem to model realistic representations such as when modeling the primary 
visual cortex, one may implement a solut ion designed after the rich l y late rally 
connected architecture of cortical layers ( Fischer et al.l . l2005l 120061 12007^ . In 
fact, an important part of cortical areas consists of a lateral network propa- 
gating information in parallel between neurons. We will here propose that the 
NP-problem can be approximately solved by using a cross-correlation based in- 
hibition between neurons. 



In fac t, as was first proposed in the Sparse Spike Coding (SSC) algorithm (Perr inet et al 



20021 ). one could use a greedy algorithm on the Ln-norm cos t and that these led 
to use of Matching Pursuit algorithm (JMallat and Zhand . Il993[ ). More gen- 
erally, let's first define Weighted Matching Pursuit (WMP) by introducing a 
non-linearity in the choice step. Like Matching Pursuit, it is based on two 
repetitive steps. First, given the signal x, we are searching for the single source 
s*t..hj' that corresponds to the maximum a posteriori (MAP) realization for 
X (see Eq. [2]) transformed by a point non-linearity fj . This Matching step is 
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defined by: 

r=ArgMax^.[/,(p,)] (11) 

where fj{.) is some gain function that we will describe below and which may be 
set initially to strictly increasing functions and pj is initialized by Eq. [21 In a 
second step (Pursuit), the information is fed-back to correlated sources through 

a; <— a; — s*. .hj* (12) 

where s*, is the scalar projection < Xjhj* >. Equivalently, from the linearity 
of the scalar product, we may propagate laterally: 

< X, hj >^< X, hj > — < X, hj* >< hj<- , hj > (13) 

that is from Eq. [2 

Pj ^ Pj-Pr < hj',hj > (14) 

For any set of monotonously increasing functions /_,■ , WMP shares many prop- 
erties with MP, such as the monotonous decrease of the error or the exponential 
convergence of the coding. The algorithm is then iterated with Eg. llll until some 
stopping criteria is reached. The signal may be reconstructed from the spike 
list as a; = '^ Pj{k)hj(k)i where Pj(k) is the value reconstructed using Eq. llOl We 
then define Competition Optimized Matching Pursuit (COMP) as WMP where 
the point non-linearities are defined by Eq. [5] and Sparse Spike Coding (SSC) is 
then defined as the spik e coding/decod ing algorithm which uses COMP as the 
coder. As described in ( Perrineti 12004 ). while the Matching step is efficiently 



performed by the LIE neurons driven by the NL input, the pursuit step could be 
implemented in a cortical area by a correlation-based inhibition. This type of in- 
hibition is typical of fast-spiking interneurons though there is no direct evidence 
of this activity-based synaptic topology. It will correspond to a lateral interac- 
tion within the linear (L) neuronal population. In practice, the fj functions are 
initialized for all neurons to the identity function (that is to a MP algorithm) 
and then evaluated using an online stochastic algorithm with a "learning" pa- 
rameter corresponding to a smooth average which effect was controlled. As a 
matter of fact, this algorithm is circular since the choice of s is non- linear and 
depends on the choice of fj. However, thanks to the exponential convergence 
of MP, for any set of components, the fj will converge to the correct non-linear 
functions as defined by Eq. [H This scheme extends the Matching Pursuit (MP) 
algorithm by linking it to a statistical model which tunes optimally the match- 
ing step (in the sense that all choices are statistically equally probable) thanks 
to the adaptive point linearity. In fact, as stated before, thanks to the uniform 
distribution of the choice of a component, one maximizes the entropy of every 
match and therefore of the computational power of the ArgMax operator. Think 
a contrario to a totally unbalanced network where the match will be always a 
given neuron: the spikes are totally predictable and the information carried by 
the spike list then drops to zero. It therefore optimizes the efficiency of MP for 
the Sparse Spike Coding problem (see Fig. [3]). 

Extensions of this type of event-based algorithms are multiple. First, It extends 
naturally to the temporal domain. In fact, we restricted us ourselves here to 
static flashed im ages, but is easily extendable to causal filters (see Ch. 3.4.1 in 
( Perrined . l2007l )). It however raises the unsolved problem of a dynamical com- 



promise between precision and rapidity of the code which is still unanswered. 
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Figure 4: Efficiency of Competition Optimized Matching Pursuit (COMP). 
Spike Coding and Sparse Spike Coding (using COMP) produce flows of spikes 
representing the image. By representing the the distance of the original image 
with a reconstruction, one may quantify the dynamical efficiency of this solution 
as a function of the number of spikes. (Left) When applying the algorithm on a 
set of natural images, the coefficients exhibited differences in their probability 
density functions. We show this by plotting the cumulative density functions 
of the coefficients for different levels in the pyramid. Using these cumulative 
pdf, one could transform the pyramids of coefficients in pyramids for which all 
coefficients where a priori equiprobable. This optimizes the ArgMax operator 
which is at the heart of the Sparse Spike Coding scheme. (Right) The resulting 
COMP solution gives a similar result than MP in terms of residual energy as a 
function of pure Lq sparseness (see inset). In fact, in MP, by taking the max- 
imum absolute, and since the decrease o f energy i s prop ortional to the square 
of the coefficient (see Chapter 3.1.2 of ( Perrinetl . 120071 )) one ensures that the 
decrease of MSE per coefficient is optimal for MP. These are both better for 
that purpose than conjugate gradient. However, when defining the efficiency 
in terms of the residual energy as a function of the description length of the 
spiking code word, then the proposed COMP model is more efficient than MP 
because of the quantization errors inherent to the higher variability of coded 
coefficients. Thus, including homeostasis improved the efficiency of adaptive 
Sparse Spike Coding by ensuring that the decrease of MSE per bit of code is op- 
timal. It should be noted that the homeostasis mechanism is important during 
"learning" but that it is not useful for "pure" coding (see Sec. [S]). 



14 



It may also be extended i n a a daptive code, showing the emergence of Vl-like 
receptive fields (jPerrinetl . 120081 ). At last, using in these sparse representations 



of long-range interactions such as those present in the primary visual cortex 
should prove to be very helpful to resolve generic image processing problems 
such as denoising. 

Reproducible science / Acknowledgments 
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